<!DOCTYPE html><html lang="zh-CN"><head><meta charset="UTF-8"><meta http-equiv="X-UA-Compatible" content="IE=edge"><meta name="viewport" content="width=device-width, initial-scale=1"><meta name="format-detection" content="telephone=no"><meta name="apple-mobile-web-app-capable" content="yes"><meta name="apple-mobile-web-app-status-bar-style" content="black"><link rel="icon" href="/images/icons/favicon-16x16.png?v=2.8.0" type="image/png" sizes="16x16"><link rel="icon" href="/images/icons/favicon-32x32.png?v=2.8.0" type="image/png" sizes="32x32"><meta name="description" content="Attention Is All You Need         本文链接: https:&#x2F;&#x2F;arxiv.org&#x2F;abs&#x2F;1706.03762 代码仓库: https:&#x2F;&#x2F;github.com&#x2F;tensorflow&#x2F;tensor2tensor                       0 Abstract 摘要        主流的序列转录模型都是基于复杂的RNN或CN">
<meta property="og:type" content="article">
<meta property="og:title" content="【看论文】Transformer: Attention Is All You Need">
<meta property="og:url" content="http://hipposox.github.io/2022/10/04/Transformer/index.html">
<meta property="og:site_name" content="Hexo">
<meta property="og:description" content="Attention Is All You Need         本文链接: https:&#x2F;&#x2F;arxiv.org&#x2F;abs&#x2F;1706.03762 代码仓库: https:&#x2F;&#x2F;github.com&#x2F;tensorflow&#x2F;tensor2tensor                       0 Abstract 摘要        主流的序列转录模型都是基于复杂的RNN或CN">
<meta property="og:locale" content="zh_CN">
<meta property="og:image" content="https://www.helloimg.com/images/2023/01/12/oGOy6u.png">
<meta property="article:published_time" content="2022-10-04T06:51:35.000Z">
<meta property="article:modified_time" content="2023-01-12T13:48:25.190Z">
<meta property="article:author" content="HippoSoX">
<meta property="article:tag" content="看论文">
<meta property="article:tag" content="Transformer">
<meta name="twitter:card" content="summary">
<meta name="twitter:image" content="https://www.helloimg.com/images/2023/01/12/oGOy6u.png"><title>【看论文】Transformer: Attention Is All You Need | Hexo</title><link ref="canonical" href="http://hipposox.github.io/2022/10/04/Transformer/"><link rel="dns-prefetch" href="https://cdn.jsdelivr.net"><link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/@fortawesome/fontawesome-free@5.12.1/css/all.min.css" type="text/css"><link rel="stylesheet" href="/css/index.css?v=2.8.0"><link rel="stylesheet" href="css/custom.css"><script>var Stun = window.Stun || {};
var CONFIG = {
  root: '/',
  algolia: undefined,
  assistSearch: undefined,
  fontIcon: {"prompt":{"success":"fas fa-check-circle","info":"fas fa-arrow-circle-right","warning":"fas fa-exclamation-circle","error":"fas fa-times-circle"},"copyBtn":"fas fa-copy"},
  sidebar: {"offsetTop":"20px","tocMaxDepth":6},
  header: {"enable":true,"showOnPost":true,"scrollDownIcon":false},
  postWidget: {"endText":true},
  nightMode: {"enable":true},
  back2top: {"enable":true},
  codeblock: {"style":"default","highlight":"dark","wordWrap":false},
  reward: false,
  fancybox: false,
  zoomImage: {"gapAside":"20px"},
  galleryWaterfall: undefined,
  lazyload: true,
  pjax: undefined,
  externalLink: {"icon":{"enable":true,"name":"fas fa-external-link-alt"}},
  shortcuts: undefined,
  prompt: {"copyButton":"复制","copySuccess":"复制成功","copyError":"复制失败"},
  sourcePath: {"js":"js","css":"css","images":"images"},
};

window.CONFIG = CONFIG;</script><meta name="generator" content="Hexo 5.4.2"></head><body><div class="container" id="container"><header class="header" id="header"><div class="header-inner"><nav class="header-nav header-nav--fixed"><div class="header-nav-inner"><div class="header-nav-menubtn"><i class="fas fa-bars"></i></div><div class="header-nav-menu"><div class="header-nav-menu-item"><a class="header-nav-menu-item__link" href="/"><span class="header-nav-menu-item__icon"><i class="fas fa-home"></i></span><span class="header-nav-menu-item__text">首页</span></a></div><div class="header-nav-menu-item"><a class="header-nav-menu-item__link" href="/archives/"><span class="header-nav-menu-item__icon"><i class="fas fa-folder-open"></i></span><span class="header-nav-menu-item__text">归档</span></a></div><div class="header-nav-menu-item"><a class="header-nav-menu-item__link" href="/categories/"><span class="header-nav-menu-item__icon"><i class="fas fa-layer-group"></i></span><span class="header-nav-menu-item__text">分类</span></a></div><div class="header-nav-menu-item"><a class="header-nav-menu-item__link" href="/tags/"><span class="header-nav-menu-item__icon"><i class="fas fa-tags"></i></span><span class="header-nav-menu-item__text">标签</span></a></div></div><div class="header-nav-mode"><div class="mode"><div class="mode-track"><span class="mode-track-moon"></span><span class="mode-track-sun"></span></div><div class="mode-thumb"></div></div></div></div></nav><div class="header-banner"><div class="header-banner-info"><div class="header-banner-info__title">erocool</div><div class="header-banner-info__subtitle">You know what</div></div></div></div></header><main class="main" id="main"><div class="main-inner"><div class="content-wrap" id="content-wrap"><div class="content" id="content"><!-- Just used to judge whether it is an article page--><div id="is-post"></div><div class="post"><header class="post-header"><h1 class="post-title">【看论文】Transformer: Attention Is All You Need</h1><div class="post-meta"><span class="post-meta-item post-meta-item--createtime"><span class="post-meta-item__icon"><i class="far fa-calendar-plus"></i></span><span class="post-meta-item__info">发表于</span><span class="post-meta-item__value">2022-10-04</span></span><span class="post-meta-item post-meta-item--updatetime"><span class="post-meta-item__icon"><i class="far fa-calendar-check"></i></span><span class="post-meta-item__info">更新于</span><span class="post-meta-item__value">2023-01-12</span></span></div></header><div class="post-body">
        <h1 id="attention-is-all-you-need"   >
          <a href="#attention-is-all-you-need" class="heading-link"><i class="fas fa-link"></i></a><a class="markdownIt-Anchor" href="#attention-is-all-you-need"></a> Attention Is All You Need</h1>
      
<blockquote>
<p>本文链接: <span class="exturl"><a class="exturl__link"   target="_blank" rel="noopener" href="https://arxiv.org/abs/1706.03762" >https://arxiv.org/abs/1706.03762</a><span class="exturl__icon"><i class="fas fa-external-link-alt"></i></span></span><br />
代码仓库: <span class="exturl"><a class="exturl__link"   target="_blank" rel="noopener" href="https://github.com/tensorflow/tensor2tensor" >https://github.com/tensorflow/tensor2tensor</a><span class="exturl__icon"><i class="fas fa-external-link-alt"></i></span></span></p>
</blockquote>

        <h2 id="0-abstract-摘要"   >
          <a href="#0-abstract-摘要" class="heading-link"><i class="fas fa-link"></i></a><a class="markdownIt-Anchor" href="#0-abstract-摘要"></a> 0 Abstract 摘要</h2>
      
<p>主流的序列转录模型都是基于复杂的RNN或CNN，包括编码器和解码器。性能最好的模型还通过注意力机制连接编码器和解码器。我们提出了一种新的简单网络架构，Transformer，它完全基于注意力机制，完全不需要递归和卷积。在两个机器翻译任务上的实验表明，这些模型在质量上具有优越性，同时更具并行性，训练时间明显减少。我们的模型在WMT 2014英语到德语翻译任务中达到28.4 BLEU，比现有最佳结果（包括ensembles）提高了2 BLEU以上。在WMT 2014英法翻译任务中，我们的模型在八个GPU上训练了3.5天之后，建立了一个新的单一型最先进的BLEU分数41.8，这只是文献中最佳车型训练成本的一小部分。我们证明Transformer可以很好地推广到其他任务，它成功地应用于英语选区分析，既有大量的训练数据，也有有限的训练数据。</p>
<span id="more"></span>

        <h2 id="1-introduction-介绍"   >
          <a href="#1-introduction-介绍" class="heading-link"><i class="fas fa-link"></i></a><a class="markdownIt-Anchor" href="#1-introduction-介绍"></a> 1 Introduction 介绍</h2>
      
<p>RNN(递归神经网络)，特别是长短期记忆和门控递归神经网络已被牢固确立为序列建模和转录问题（如语言建模和机器翻译）的最新方法。此后，许多工作都在继续推进递归语言模型和编码器-解码器体系结构的边界。</p>
<p>递归模型通常沿着输入和输出序列的符号位置进行因子计算。将位置与计算时间中的步骤对齐，它们生成一系列隐藏状态ht，作为先前隐藏状态ht-1和位置t的输入的函数。这种固有的顺序性排除了训练示例内的并行化，这在较长的序列长度下变得至关重要，因为内存约束限制了示例之间的批处理。最近的工作通过因子分解技巧和条件计算显著提高了计算效率，同时也提高了后者的模型性能。然而，顺序计算的基本约束仍然存在。</p>
<p>注意力机制已经成为各种任务中引人注目的序列建模和转导模型的组成部分，允许对依赖性进行建模，而不考虑它们在输入或输出序列中的距离。然而，在除少数情况外的所有情况下，这种注意机制都与循环网络一起使用。</p>
<p>在这项工作中，我们提出了Transformer，这是一种避免重复的模型架构，而是完全依赖于注意力机制来绘制输入和输出之间的全局依赖关系。Transformer允许显著提高并行化，并且在8个P100 GPU上训练了12小时后，可以在翻译质量方面达到最新水平。</p>

        <h2 id="2-background-研究背景"   >
          <a href="#2-background-研究背景" class="heading-link"><i class="fas fa-link"></i></a><a class="markdownIt-Anchor" href="#2-background-研究背景"></a> 2 Background 研究背景</h2>
      
<p>减少顺序计算的目标也构成了扩展神经GPU、ByteNet和ConvS2S的基础，所有这些都使用卷积神经网络作为基本构建块，并行计算所有输入和输出位置的隐藏表示。在这些模型中，关联来自两个任意输入或输出位置的信号所需的操作数量随着位置之间的距离而增加，ConvS2S为线性，ByteNet为对数。这使得学习远距离位置之间的相关性变得更加困难。在Transformer中，这被减少到恒定数量的操作，尽管代价是由于平均注意力加权位置而降低了有效分辨率，如第3.2节所述，我们通过多头注意力抵消了这种影响。</p>
<p>自我注意力，有时称为内部注意力，是一种将单个序列的不同位置联系起来以计算序列表示的注意机制。自我注意已成功地应用于各种任务，包括阅读理解、摘要摘要、文本蕴涵和学习任务无关的句子表征。</p>
<p>端到端记忆网络基于循环注意机制，而不是序列对齐的循环，并已被证明在简单的语言问答和语言建模任务中表现良好。</p>
<p>然而，据我们所知，Transformer是第一个完全依靠自我关注来计算其输入和输出表示的转导模型，而不使用序列对齐的RNN或卷积。在接下来的章节中，我们将描述Transformer，激发自我关注，并讨论其相对于[17，18]和[9]等模型的优势。</p>

        <h2 id="3-model-architecture-模型结构"   >
          <a href="#3-model-architecture-模型结构" class="heading-link"><i class="fas fa-link"></i></a><a class="markdownIt-Anchor" href="#3-model-architecture-模型结构"></a> 3 Model Architecture 模型结构</h2>
      
<p>大多数竞争性神经序列转导模型具有编码器-解码器结构。这里，编码器将符号表示的输入序列（x1，…，xn）映射到连续表示的序列z＝（z1，…，zn）。给定z，解码器然后一次生成一个元素的符号输出序列（y1，…，ym）。在每一步，模型都是自回归的，在生成下一步时，使用先前生成的符号作为附加输入。</p>
<p>Transformer遵循这种整体架构，为编码器和解码器使用堆叠的自关注和逐点、完全连接的层，分别如图1的左半部分和右半部分所示。</p>

        <h3 id="31-encoder-and-decoder-stacks-编码器与解码器堆栈"   >
          <a href="#31-encoder-and-decoder-stacks-编码器与解码器堆栈" class="heading-link"><i class="fas fa-link"></i></a><a class="markdownIt-Anchor" href="#31-encoder-and-decoder-stacks-编码器与解码器堆栈"></a> 3.1 Encoder and Decoder Stacks 编码器与解码器堆栈</h3>
      
<p><a target="_blank" rel="noopener" href="https://www.helloimg.com/image/oGOy6u">
        <img   class="lazyload lazyload-gif"
          src="/images/loading.svg" data-src="https://www.helloimg.com/images/2023/01/12/oGOy6u.png"  alt="oGOy6u.png" border="0" />
      </a></p>
<p>编码器：编码器由N=6个相同层的堆栈组成。每个层有两个子层。第一种是多头自我注意机制，第二种是简单的、位置式的全连接前馈网络。我们在两个子层中的每一个周围使用残差连接，然后进行层规范化。也就是说，每个子层的输出是LayerNorm（x+Sublayer（x）），其中Sublayer（x）是子层本身实现的函数。为了促进这些剩余连接，模型中的所有子层以及嵌入层都产生维度dmodel=512的输出。</p>
<p>解码器：解码器也由N=6个相同层的堆栈组成。除了每个编码器层中的两个子层之外，解码器还插入第三个子层，该子层对编码器堆栈的输出执行多头注意力。与编码器类似，我们在每个子层周围使用残余连接，然后进行层规范化。我们还修改了解码器堆栈中的自注意力子层，以防止位置关注后续位置。这种掩蔽，加上输出嵌入偏移一个位置的事实，确保了位置i的预测只能依赖于小于i的位置处的已知输出。</p>

        <h3 id="32-attention-注意力机制"   >
          <a href="#32-attention-注意力机制" class="heading-link"><i class="fas fa-link"></i></a><a class="markdownIt-Anchor" href="#32-attention-注意力机制"></a> 3.2 Attention 注意力机制</h3>
      
<p>注意力函数可以描述为将查询和一组键值对映射到输出，其中查询、键、值和输出都是向量。输出被计算为值的加权和，其中分配给每个值的权重由查询与对应键的兼容性函数计算。</p>

        <h4 id="321-scaled-dot-product-attention-标量点乘注意力机制"   >
          <a href="#321-scaled-dot-product-attention-标量点乘注意力机制" class="heading-link"><i class="fas fa-link"></i></a><a class="markdownIt-Anchor" href="#321-scaled-dot-product-attention-标量点乘注意力机制"></a> 3.2.1 Scaled Dot-Product Attention 标量点乘注意力机制</h4>
      
<p>我们将我们的特别的注意力机制称为“标量点乘注意力机制”。输入由维度dk的查询和键以及维度dv的值组成。我们使用所有键计算查询的点积，将每个键除以√dk，然后应用softmax函数来获得值的权重。</p>
<p>在实践中，我们同时计算一组查询的注意力函数，并将其打包成矩阵Q。键和值也打包成矩阵K和V。我们将输出矩阵计算为：</p>
<p class='katex-block'><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>A</mi><mi>t</mi><mi>t</mi><mi>e</mi><mi>n</mi><mi>t</mi><mi>i</mi><mi>o</mi><mi>n</mi><mo stretchy="false">(</mo><mi>Q</mi><mo separator="true">,</mo><mi>K</mi><mo separator="true">,</mo><mi>V</mi><mo stretchy="false">)</mo><mo>=</mo><mi>s</mi><mi>o</mi><mi>f</mi><mi>t</mi><mi>m</mi><mi>a</mi><mi>x</mi><mo stretchy="false">(</mo><mfrac><mrow><mi>Q</mi><msup><mi>K</mi><mi>T</mi></msup></mrow><msqrt><msub><mi>d</mi><mi>k</mi></msub></msqrt></mfrac><mo stretchy="false">)</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">Attention(Q,K,V)=softmax(\frac{QK^T}{\sqrt{d_k}})V
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal">A</span><span class="mord mathnormal">t</span><span class="mord mathnormal">t</span><span class="mord mathnormal">e</span><span class="mord mathnormal">n</span><span class="mord mathnormal">t</span><span class="mord mathnormal">i</span><span class="mord mathnormal">o</span><span class="mord mathnormal">n</span><span class="mopen">(</span><span class="mord mathnormal">Q</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathnormal" style="margin-right:0.07153em;">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathnormal" style="margin-right:0.22222em;">V</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:2.448331em;vertical-align:-0.93em;"></span><span class="mord mathnormal">s</span><span class="mord mathnormal">o</span><span class="mord mathnormal" style="margin-right:0.10764em;">f</span><span class="mord mathnormal">t</span><span class="mord mathnormal">m</span><span class="mord mathnormal">a</span><span class="mord mathnormal">x</span><span class="mopen">(</span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.5183309999999999em;"><span style="top:-2.25278em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord sqrt"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.85722em;"><span class="svg-align" style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord" style="padding-left:0.833em;"><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span><span style="top:-2.81722em;"><span class="pstrut" style="height:3em;"></span><span class="hide-tail" style="min-width:0.853em;height:1.08em;"><svg width='400em' height='1.08em' viewBox='0 0 400000 1080' preserveAspectRatio='xMinYMin slice'><path d='M95,702
c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14
c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54
c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10
s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429
c69,-144,104.5,-217.7,106.5,-221
l0 -0
c5.3,-9.3,12,-14,20,-14
H400000v40H845.2724
s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7
c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z
M834 80h400000v40h-400000z'/></svg></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.18278000000000005em;"><span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord mathnormal">Q</span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07153em;">K</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.93em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mclose">)</span><span class="mord mathnormal" style="margin-right:0.22222em;">V</span></span></span></span></span></p>
<p>最常用的两种注意力函数是加法注意力和点积（乘法）注意力。点积注意力与我们的算法相同，只是比例因子为1/√dk。加法注意力使用具有单个隐藏层的前馈网络来计算兼容性函数。虽然两者在理论复杂度上相似，但由于可以使用高度优化的矩阵乘法代码来实现，因此在实践中，点积注意力速度更快，空间效率更高。虽然对于较小的dk值，这两种机制的表现相似，但在不缩放较大的dk的情况下，加法注意力优于点积注意力。我们怀疑，对于较大的dk值，点积的大小会变大，从而将softmax函数推入其梯度极小的区域。为了抵消这种影响，我们将点积缩放1/√dk。</p>

        <h4 id="322-multi-head-attention-多头注意力机制"   >
          <a href="#322-multi-head-attention-多头注意力机制" class="heading-link"><i class="fas fa-link"></i></a><a class="markdownIt-Anchor" href="#322-multi-head-attention-多头注意力机制"></a> 3.2.2 Multi-Head Attention 多头注意力机制</h4>
      
<p>与使用dmodel维度键、值和查询执行单个注意力功能不同，我们发现将查询、键和值分别以不同的、学习后的线性投影线性投影h次到dk、dk和dv维度是有益的。然后，在查询、键和值的每个投影版本上，我们并行执行注意力函数，生成dv维输出值。这些值被连接起来并再次投影，得到最终值，如图1所示。</p>
<p>多头注意力允许模型在不同位置共同关注来自不同表示子空间的信息。对于某个单注意力头，平均值会抑制这种情况。</p>

        <h4 id="323-applications-of-attention-in-our-model-注意力机制在我们的模型中的应用"   >
          <a href="#323-applications-of-attention-in-our-model-注意力机制在我们的模型中的应用" class="heading-link"><i class="fas fa-link"></i></a><a class="markdownIt-Anchor" href="#323-applications-of-attention-in-our-model-注意力机制在我们的模型中的应用"></a> 3.2.3 Applications of Attention in our Model 注意力机制在我们的模型中的应用</h4>
      
</div><footer class="post-footer"><div class="post-ending ending"><div class="ending__text">------ 本文结束，感谢您的阅读 ------</div></div><div class="post-copyright copyright"><div class="copyright-author"><span class="copyright-author__name">本文作者: </span><span class="copyright-author__value"><a href="http://hipposox.github.io">HippoSoX</a></span></div><div class="copyright-link"><span class="copyright-link__name">本文链接: </span><span class="copyright-link__value"><a href="http://hipposox.github.io/2022/10/04/Transformer/">http://hipposox.github.io/2022/10/04/Transformer/</a></span></div><div class="copyright-notice"><span class="copyright-notice__name">版权声明: </span><span class="copyright-notice__value">本博客所有文章除特别声明外，均采用 <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en" rel="external nofollow" target="_blank">BY-NC-SA</a> 许可协议。转载请注明出处！</span></div></div><div class="post-tags"><span class="post-tags-item"><span class="post-tags-item__icon"><i class="fas fa-tag"></i></span><a class="post-tags-item__link" href="http://hipposox.github.io/tags/%E7%9C%8B%E8%AE%BA%E6%96%87/">看论文</a></span><span class="post-tags-item"><span class="post-tags-item__icon"><i class="fas fa-tag"></i></span><a class="post-tags-item__link" href="http://hipposox.github.io/tags/Transformer/">Transformer</a></span></div><nav class="post-paginator paginator"><div class="paginator-prev"><a class="paginator-prev__link" href="/2023/01/03/Deeplearning-ai/"><span class="paginator-prev__icon"><i class="fas fa-angle-left"></i></span><span class="paginator-prev__text">Deeplearning.ai</span></a></div><div class="paginator-next"><a class="paginator-next__link" href="/2022/10/03/BEVFormer/"><span class="paginator-prev__text">【看论文】BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers</span><span class="paginator-next__icon"><i class="fas fa-angle-right"></i></span></a></div></nav></footer></div></div></div><div class="sidebar-wrap" id="sidebar-wrap"><aside class="sidebar" id="sidebar"><div class="sidebar-nav"><span class="sidebar-nav-toc current">文章目录</span><span class="sidebar-nav-ov">站点概览</span></div><section class="sidebar-toc"><ol class="toc"><li class="toc-item toc-level-1"><a class="toc-link" href="#attention-is-all-you-need"><span class="toc-number">1.</span> <span class="toc-text">
           Attention Is All You Need</span></a><ol class="toc-child"><li class="toc-item toc-level-2"><a class="toc-link" href="#0-abstract-%E6%91%98%E8%A6%81"><span class="toc-number">1.1.</span> <span class="toc-text">
           0 Abstract 摘要</span></a></li><li class="toc-item toc-level-2"><a class="toc-link" href="#1-introduction-%E4%BB%8B%E7%BB%8D"><span class="toc-number">1.2.</span> <span class="toc-text">
           1 Introduction 介绍</span></a></li><li class="toc-item toc-level-2"><a class="toc-link" href="#2-background-%E7%A0%94%E7%A9%B6%E8%83%8C%E6%99%AF"><span class="toc-number">1.3.</span> <span class="toc-text">
           2 Background 研究背景</span></a></li><li class="toc-item toc-level-2"><a class="toc-link" href="#3-model-architecture-%E6%A8%A1%E5%9E%8B%E7%BB%93%E6%9E%84"><span class="toc-number">1.4.</span> <span class="toc-text">
           3 Model Architecture 模型结构</span></a><ol class="toc-child"><li class="toc-item toc-level-3"><a class="toc-link" href="#31-encoder-and-decoder-stacks-%E7%BC%96%E7%A0%81%E5%99%A8%E4%B8%8E%E8%A7%A3%E7%A0%81%E5%99%A8%E5%A0%86%E6%A0%88"><span class="toc-number">1.4.1.</span> <span class="toc-text">
           3.1 Encoder and Decoder Stacks 编码器与解码器堆栈</span></a></li><li class="toc-item toc-level-3"><a class="toc-link" href="#32-attention-%E6%B3%A8%E6%84%8F%E5%8A%9B%E6%9C%BA%E5%88%B6"><span class="toc-number">1.4.2.</span> <span class="toc-text">
           3.2 Attention 注意力机制</span></a><ol class="toc-child"><li class="toc-item toc-level-4"><a class="toc-link" href="#321-scaled-dot-product-attention-%E6%A0%87%E9%87%8F%E7%82%B9%E4%B9%98%E6%B3%A8%E6%84%8F%E5%8A%9B%E6%9C%BA%E5%88%B6"><span class="toc-number">1.4.2.1.</span> <span class="toc-text">
           3.2.1 Scaled Dot-Product Attention 标量点乘注意力机制</span></a></li><li class="toc-item toc-level-4"><a class="toc-link" href="#322-multi-head-attention-%E5%A4%9A%E5%A4%B4%E6%B3%A8%E6%84%8F%E5%8A%9B%E6%9C%BA%E5%88%B6"><span class="toc-number">1.4.2.2.</span> <span class="toc-text">
           3.2.2 Multi-Head Attention 多头注意力机制</span></a></li><li class="toc-item toc-level-4"><a class="toc-link" href="#323-applications-of-attention-in-our-model-%E6%B3%A8%E6%84%8F%E5%8A%9B%E6%9C%BA%E5%88%B6%E5%9C%A8%E6%88%91%E4%BB%AC%E7%9A%84%E6%A8%A1%E5%9E%8B%E4%B8%AD%E7%9A%84%E5%BA%94%E7%94%A8"><span class="toc-number">1.4.2.3.</span> <span class="toc-text">
           3.2.3 Applications of Attention in our Model 注意力机制在我们的模型中的应用</span></a></li></ol></li></ol></li></ol></li></ol></section><!-- ov = overview--><section class="sidebar-ov hide"><div class="sidebar-ov-author"><div class="sidebar-ov-author__avatar"><img class="sidebar-ov-author__avatar_img" src="/images/icons/stun-logo.svg" alt="avatar"></div><p class="sidebar-ov-author__text">motto</p></div><div class="sidebar-ov-state"><a class="sidebar-ov-state-item sidebar-ov-state-item--posts" href="/archives/"><div class="sidebar-ov-state-item__count">19</div><div class="sidebar-ov-state-item__name">归档</div></a><a class="sidebar-ov-state-item sidebar-ov-state-item--categories" href="/categories/"><div class="sidebar-ov-state-item__count">6</div><div class="sidebar-ov-state-item__name">分类</div></a><a class="sidebar-ov-state-item sidebar-ov-state-item--tags" href="/tags/"><div class="sidebar-ov-state-item__count">14</div><div class="sidebar-ov-state-item__name">标签</div></a></div><div class="sidebar-ov-cc"><a href="https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en" target="_blank" rel="noopener" data-popover="知识共享许可协议" data-popover-pos="up"><img src="/images/cc-by-nc-sa.svg"></a></div></section><div class="sidebar-reading"><div class="sidebar-reading-info"><span class="sidebar-reading-info__text">你已阅读了 </span><span class="sidebar-reading-info__num">0</span><span class="sidebar-reading-info__perc">%</span></div><div class="sidebar-reading-line"></div></div><iframe frameborder="no" border="0" marginwidth="0" marginheight="0" width=330 height=86 src="//music.163.com/outchain/player?type=2&id=1449790718&auto=1&height=66"></iframe></aside></div><div class="clearfix"></div></div></main><footer class="footer" id="footer"><div class="footer-inner"><div><span>Copyright © 2023</span><span class="footer__icon"><i class="fas fa-heart"></i></span><span>HippoSoX</span></div><div><span>由 <a href="http://hexo.io/" title="Hexo" target="_blank" rel="noopener">Hexo</a> 强力驱动</span><span> v5.4.2</span><span class="footer__devider">|</span><span>主题 - <a href="https://github.com/liuyib/hexo-theme-stun/" title="Stun" target="_blank" rel="noopener">Stun</a></span><span> v2.8.0</span></div></div></footer><div class="loading-bar" id="loading-bar"><div class="loading-bar__progress"></div></div><div class="back2top" id="back2top"><span class="back2top__icon"><i class="fas fa-rocket"></i></span></div></div><script src="https://cdn.jsdelivr.net/npm/jquery@v3.4.1/dist/jquery.min.js"></script><script src="https://cdn.jsdelivr.net/npm/velocity-animate@1.5.2/velocity.min.js"></script><script src="https://cdn.jsdelivr.net/npm/velocity-animate@1.5.2/velocity.ui.min.js"></script><script src="https://cdn.jsdelivr.net/npm/lazyload@2.0.0-rc.2/lazyload.min.js"></script><link href="https://cdn.jsdelivr.net/npm/katex@0.10.2/dist/katex.min.css" rel="stylesheet" type="text/css"><link href="https://cdn.jsdelivr.net/npm/katex@0.10.2/dist/contrib/copy-tex.css" rel="stylesheet" type="text/css"><script src="https://cdn.jsdelivr.net/npm/katex@0.10.2/dist/contrib/copy-tex.min.js"></script><script src="/js/utils.js?v=2.8.0"></script><script src="/js/stun-boot.js?v=2.8.0"></script><script src="/js/scroll.js?v=2.8.0"></script><script src="/js/header.js?v=2.8.0"></script><script src="/js/sidebar.js?v=2.8.0"></script></body></html>